342 research outputs found

    The Use of Optimal Cue Mapping to Improve the Intelligibility and Quality of Speech in Complex Binaural Sound Mixtures.

    A person with normal hearing can follow a particular conversation of interest in a noisy and reverberant environment whilst simultaneously ignoring the interfering sounds. This task often becomes more challenging for individuals with a hearing impairment. Attending selectively to a sound source is difficult to replicate in machines, including devices such as hearing aids. A correctly set up hearing aid will work well in quiet conditions, but its performance may deteriorate seriously in the presence of competing sounds. To be of help in these more challenging situations, the hearing aid should be able to segregate the desired sound source from any other, unwanted sounds. This thesis explores a novel approach to speech segregation based on optimal cue mapping (OCM). OCM is a signal processing method for segregating a sound source based on spatial and other cues extracted from the binaural mixture of sounds arriving at a listener's ears. The spectral energy fraction of the target speech source in the mixture is estimated frame by frame using artificial neural networks (ANNs). The resulting target speech magnitude estimates for the left and right channels are combined with the corresponding original phase spectra to produce the final binaural output signal. The performance improvements delivered by the OCM algorithm are evaluated using the STOI and PESQ metrics for speech intelligibility and quality, respectively. A variety of increasingly challenging binaural mixtures is synthesised, involving up to five spatially separated sound sources in both anechoic and reverberant environments. The segregated speech consistently exhibits gains in intelligibility and quality, and compares favourably with a leading, somewhat more complex approach. The OCM method allows the selection and integration of multiple cues to be optimised and provides scalable performance benefits to suit the available computational resources. The ability to determine the varying relative importance of each cue in different acoustic conditions is expected to facilitate computationally efficient solutions suitable for use in a hearing aid, allowing the aid to operate effectively in a range of typical acoustic environments. Further developments are proposed to achieve this overall goal.
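    The re-synthesis step described in the abstract (scaling the mixture magnitude by an estimated target energy fraction, then reattaching the original phase) can be sketched as follows. This is a minimal illustration: the function name and array shapes are assumptions, and the ANN that produces the fraction estimates is not reproduced here.

    ```python
    import numpy as np

    def apply_energy_fraction_mask(stft_frames, fractions):
        """Apply per-bin target energy-fraction estimates to mixture STFT frames.

        stft_frames: complex array (frames, bins) of one ear's mixture STFT
        fractions:   array (frames, bins) in [0, 1], e.g. produced by an ANN
        Returns the segregated-target STFT: scaled magnitude, original phase.
        """
        magnitude = np.abs(stft_frames)
        phase = np.angle(stft_frames)
        # Scale the mixture magnitude by the estimated target fraction,
        # then reattach the original mixture phase, as the abstract describes.
        return fractions * magnitude * np.exp(1j * phase)
    ```

    In a full system this would be applied independently to the left and right channels before inverse-STFT resynthesis of the binaural output.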

    Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability (Artifact)


    Tuning Pre-trained Model via Moment Probing

    Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP), as a fundamental module, is involved in exploiting the final representations for task-dependent classification. However, most existing methods focus on how to effectively introduce a small number of learnable parameters, and little work pays attention to the commonly used LP module. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Distinguished from LP, which builds a linear classification head on the mean of the final features (e.g., word tokens for ViT) or the classification token, our MP performs linear classification on the feature distribution, which provides stronger representational ability by exploiting the richer statistical information inherent in the features. Specifically, we represent the feature distribution by its characteristic function, which is efficiently approximated using the first- and second-order moments of the features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC$^3$) module to compute second-order moments in an efficient and effective manner. Considering that MP could affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, namely MP$_+$. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at a lower training cost, while our MP$_+$ achieves state-of-the-art performance. Comment: Accepted to ICCV 2023; Project Page: https://github.com/mingzeG/Moment-Probin
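    The core idea of summarising a set of token features by their first- and second-order moments before a linear head can be sketched minimally as below. This plain mean-plus-covariance computation is an illustrative stand-in, not the paper's efficient MHC$^3$ estimator, and all names are assumptions.

    ```python
    import numpy as np

    def moment_features(tokens):
        """First- and second-order moment summary of token features.

        tokens: array (num_tokens, dim) of final-layer features (e.g. ViT tokens).
        Returns the mean concatenated with the flattened covariance; a linear
        classifier on this vector sees richer statistics than on the mean alone.
        """
        mean = tokens.mean(axis=0)                      # first-order moment
        centered = tokens - mean
        cov = centered.T @ centered / tokens.shape[0]   # second-order moment
        return np.concatenate([mean, cov.ravel()])
    ```

    The quadratic growth of the covariance term with feature dimension is what motivates the paper's more efficient multi-head cross-covariance computation.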

    Trainability Analysis of Quantum Optimization Algorithms from a Bayesian Lens

    The Quantum Approximate Optimization Algorithm (QAOA) is an extensively studied variational quantum algorithm utilized for solving optimization problems on near-term quantum devices. A significant focus is placed on determining the effectiveness of training the $n$-qubit QAOA circuit, i.e., whether the optimization error can converge to a constant level as the number of optimization iterations scales polynomially with the number of qubits. In realistic scenarios, the landscape of the corresponding QAOA objective function is generally non-convex and contains numerous local optima. In this work, motivated by the favorable performance of Bayesian optimization in handling non-convex functions, we theoretically investigate the trainability of the QAOA circuit through the lens of the Bayesian approach. This lens considers the corresponding QAOA objective function as a sample drawn from a specific Gaussian process. Specifically, we focus on two scenarios: the noiseless QAOA circuit and the noisy QAOA circuit subjected to local Pauli channels. Our first result demonstrates that the noiseless QAOA circuit with a depth of $\tilde{\mathcal{O}}(\sqrt{\log n})$ can be trained efficiently, based on the widely accepted assumption that either the left or right slice of each block in the circuit forms a local 1-design. Furthermore, we show that if each quantum gate is affected by a $q$-strength local Pauli channel with a noise strength in the range $1/\mathrm{poly}(n)$ to 0.1, the noisy QAOA circuit with a depth of $\mathcal{O}(\log n/\log(1/q))$ can also be trained efficiently. Our results offer valuable insights into the theoretical performance of quantum optimization algorithms in the noisy intermediate-scale quantum era.
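    As a concrete illustration of the kind of objective landscape under analysis, the sketch below evaluates a noiseless depth-$p$ QAOA objective for MaxCut on a single edge (two qubits) by exact state-vector simulation. This toy instance is an assumption chosen for illustration; it does not implement the paper's Bayesian analysis or its noise model.

    ```python
    import numpy as np

    def qaoa_objective(gammas, betas):
        """Expected cut value <C> of a depth-p noiseless QAOA state for
        MaxCut on a single edge (two qubits), simulated exactly.
        gammas, betas: equal-length sequences of layer angles.
        """
        cost = np.array([0.0, 1.0, 1.0, 0.0])      # cut value per basis state
        X = np.array([[0.0, 1.0], [1.0, 0.0]])
        I = np.eye(2)
        B = np.kron(X, I) + np.kron(I, X)           # transverse-field mixer
        w, V = np.linalg.eigh(B)                    # diagonalise the mixer once
        state = np.full(4, 0.5, dtype=complex)      # uniform superposition |+>^2
        for gamma, beta in zip(gammas, betas):
            state = np.exp(-1j * gamma * cost) * state                    # cost layer
            state = V @ (np.exp(-1j * beta * w) * (V.conj().T @ state))   # mixer layer
        return float(np.real(state.conj() @ (cost * state)))
    ```

    Evaluating this function over a grid of angles exposes the non-convex, multi-modal landscape that motivates treating the objective as a Gaussian-process sample; at zero angles the uniform superposition gives an expected cut value of 0.5.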